Utility Theory

How We Make Decisions and Behave (Dan Gilbert)

$$ \begin{aligned} \mathbb{E}\left[ \text{Gain} \right] &= \sum_{gain} P[\text{gain}] \cdot U(\text{gain}) \\ \text{Expected Gain} &= \sum (\text{Odds of Gain}) \times (\text{Value of Gain}) \end{aligned} $$

But people do two things:

People usually think: "What space is to size, time is to value." -- Plato

Utility Function / Value Function

$$ U : S \mapsto \mathbb{R} $$

Optimal Behavior

$$ \max_{a \in A} \sum_{s \in S} P[s|a]U(s) $$
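As a minimal sketch of the maximization above, the snippet below picks the action with the highest expected utility in a toy problem. The action names, state names, probabilities, and utilities are all made-up assumptions for illustration.

```python
from typing import Dict

# Hypothetical toy problem (all names and numbers are illustrative assumptions).
# P[a][s] plays the role of P[s | a]; U[s] plays the role of U(s).
P: Dict[str, Dict[str, float]] = {
    "safe": {"low": 0.0, "mid": 1.0, "high": 0.0},
    "risky": {"low": 0.5, "mid": 0.0, "high": 0.5},
}
U: Dict[str, float] = {"low": 0.0, "mid": 60.0, "high": 100.0}


def expected_utility(action: str,
                     P: Dict[str, Dict[str, float]],
                     U: Dict[str, float]) -> float:
    """Compute sum_s P[s | a] * U(s) for one action."""
    return sum(p * U[s] for s, p in P[action].items())


def best_action(P: Dict[str, Dict[str, float]], U: Dict[str, float]) -> str:
    """Return the action that maximizes expected utility."""
    return max(P, key=lambda a: expected_utility(a, P, U))


if __name__ == "__main__":
    for a in P:
        print(a, expected_utility(a, P, U))
    print("best:", best_action(P, U))
```

With these numbers the sure outcome beats the gamble (60 versus 50), so the maximizer chooses `safe`.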

Problem

Utility is not natural. Preferences are!

Theorem: Given a plausible set of assumptions about your preferences, there must exist a consistent utility function.

Reference: Utility Theory

Preference Axioms

Suppose a preference relation $\succeq$ satisfies the axioms. Then there exists a utility function $U$ that represents $\succeq$.

Relation:

$$ \begin{aligned} A \succeq B &\Leftrightarrow U(A) \geq U(B) \\ A \succ B &\Leftrightarrow U(A) > U(B) \\ A \sim B &\Leftrightarrow U(A) = U(B) \end{aligned} $$

Lottery:

$$ U([p_1: O_1, \dots , p_k:O_k]) = \sum_{i=1}^{k}p_iU(O_i) $$
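The lottery formula and the preference relation can be sketched in a few lines. The outcomes and utility values below are hypothetical, and `lottery_utility` and `weakly_prefers` are names chosen for this illustration only.

```python
from typing import Dict, List, Tuple

# A lottery [p_1: O_1, ..., p_k: O_k] as a list of (probability, outcome) pairs.
Lottery = List[Tuple[float, str]]

# Hypothetical utilities over outcomes (illustrative assumption).
U: Dict[str, float] = {"apple": 1.0, "banana": 0.4, "nothing": 0.0}


def lottery_utility(lottery: Lottery, U: Dict[str, float]) -> float:
    """U([p_1: O_1, ..., p_k: O_k]) = sum_i p_i * U(O_i)."""
    total_p = sum(p for p, _ in lottery)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("lottery probabilities must sum to 1")
    return sum(p * U[o] for p, o in lottery)


def weakly_prefers(a: Lottery, b: Lottery, U: Dict[str, float]) -> bool:
    """A >= B (weak preference) holds exactly when U(A) >= U(B)."""
    return lottery_utility(a, U) >= lottery_utility(b, U)


sure_banana: Lottery = [(1.0, "banana")]
gamble: Lottery = [(0.5, "apple"), (0.5, "nothing")]

if __name__ == "__main__":
    print(lottery_utility(gamble, U), lottery_utility(sure_banana, U))
    print(weakly_prefers(gamble, sure_banana, U))
```

Under these assumed utilities the gamble is worth 0.5 and the sure banana 0.4, so the gamble is strictly preferred.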

Value of Information

Expected utility of the best action given current evidence $E = e$:

$$ \mathbb{E}\left[ U_{E}(A|e) \right] = \max_{a\in A} \sum_{i} P[S_i|e, a] \cdot U(S_i) $$

Expected utility of the best action given new evidence $E' = e'$ as well:

$$ \mathbb{E}\left[ U_{E, E'}(A|e, e') \right] = \max_{a\in A} \sum_{i} P[S_i|e, e', a] \cdot U(S_i) $$

Value of knowing $E'$ (Value of Perfect Information, VPI). Since the value of $E'$ is not known in advance, we average over its possible values $e'$:

$$ \begin{aligned} \text{VPI}_{E}(E') &= \mathbb{E}\left[ U_{E,E'}(A'|e,E')\right] - \mathbb{E}\left[ U_{E}(A|e) \right] \\ &= \left(\sum_{e'} P[e'|e] \cdot \mathbb{E}\left[ U_{E,E'}(A'|e,e')\right] \right) - \mathbb{E}\left[ U_{E}(A|e) \right] \\ &= \text{Expected Utility Given New Information} - \text{Previous Expected Utility} \end{aligned} $$

Here "utility" means the best expected payoff achievable from this point on. Because of the $\max$, it does not depend on any particular action, only on the action space.
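A small numeric sketch of the VPI computation, under assumed numbers: an agent can `bet` or `pass`, and the extra evidence $E'$ is a report that is either `good` or `bad`. All names, probabilities, and utilities are hypothetical.

```python
from typing import Dict

# Hypothetical model (every name and number here is an illustrative assumption).
U: Dict[str, float] = {"win": 100.0, "lose": -100.0, "nothing": 0.0}

# P[e' | e]: distribution of the new evidence given what we already know.
P_E_PRIME: Dict[str, float] = {"good": 0.5, "bad": 0.5}

# P[s | e, e', a], indexed as P_S[e'][a][s].
P_S: Dict[str, Dict[str, Dict[str, float]]] = {
    "good": {"bet": {"win": 0.9, "lose": 0.1}, "pass": {"nothing": 1.0}},
    "bad": {"bet": {"win": 0.2, "lose": 0.8}, "pass": {"nothing": 1.0}},
}
ACTIONS = ["bet", "pass"]


def meu_given(e_prime: str) -> float:
    """max_a sum_s P[s | e, e', a] * U(s) for one observed value e'."""
    return max(
        sum(p * U[s] for s, p in P_S[e_prime][a].items()) for a in ACTIONS
    )


def meu_prior() -> float:
    """max_a sum_s P[s | e, a] * U(s), with e' marginalized out."""
    return max(
        sum(
            P_E_PRIME[ep] * p * U[s]
            for ep in P_E_PRIME
            for s, p in P_S[ep][a].items()
        )
        for a in ACTIONS
    )


def vpi() -> float:
    """sum_e' P[e' | e] * MEU(e, e') - MEU(e)."""
    with_info = sum(P_E_PRIME[ep] * meu_given(ep) for ep in P_E_PRIME)
    return with_info - meu_prior()


if __name__ == "__main__":
    print("MEU without E':", meu_prior())
    print("VPI of E':", vpi())
```

With these numbers, betting blind is worth 10, while seeing the report first is worth 40 on average (bet on `good`, pass on `bad`), so the report is worth up to 30 units of utility.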

Example (Time = Cost = -Utility)

In practice, information is partial (what we observe does not cover the exact state of the world) and imperfect (what we observe might not be reliable).

VPI tells you how much you should be willing to pay for one specific piece of information. However, evaluating pieces one at a time can be myopic: a single piece of the story may be worthless on its own even though knowing all the pieces together is valuable.
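One way to see the myopia is a toy XOR problem, sketched below under assumed settings: two independent fair bits `x` and `y`, and an agent paid 1 for correctly guessing `x XOR y`. Observing either bit alone has zero VPI, while observing both has positive VPI. The setup and function names are illustrative assumptions.

```python
from itertools import product
from typing import Callable, Dict, Hashable, List, Tuple

World = Tuple[int, int]  # (x, y)

# Assumption: two independent fair bits, so all four worlds are equally likely.
WORLDS: List[World] = list(product([0, 1], [0, 1]))
ACTIONS = [0, 1]  # the agent's guess for x XOR y


def utility(action: int, world: World) -> float:
    """Payoff 1 for a correct guess of x XOR y, else 0."""
    x, y = world
    return 1.0 if action == (x ^ y) else 0.0


def meu(worlds: List[World]) -> float:
    """Best expected utility when the given worlds are equally likely."""
    return max(
        sum(utility(a, w) for w in worlds) / len(worlds) for a in ACTIONS
    )


def vpi(observe: Callable[[World], Hashable]) -> float:
    """VPI of an observation, given as a function from world to observed value."""
    groups: Dict[Hashable, List[World]] = {}
    for w in WORLDS:
        groups.setdefault(observe(w), []).append(w)
    with_info = sum(len(g) / len(WORLDS) * meu(g) for g in groups.values())
    return with_info - meu(WORLDS)


vpi_x = vpi(lambda w: w[0])   # observe x only
vpi_y = vpi(lambda w: w[1])   # observe y only
vpi_xy = vpi(lambda w: w)     # observe both bits

if __name__ == "__main__":
    print(vpi_x, vpi_y, vpi_xy)
```

A greedy buyer who prices each bit separately would buy neither, even though the pair together raises expected utility from 0.5 to 1.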

Conclusion

Decision theory provides a framework for optimal decision making.

The principle is: maximizing expected utility.

by Jon